Last modified: 2021-05-14 13:29:20
Compiled: Fri May 14 13:29:54 2021

1 Introduction

1.1 Background

In the context of the Semantic Web, structured data refers to data that is written according to explicit rules and that carries reference relationships between data items. In other words, it is data that has been made more machine-readable by adding metadata. Typical examples of structured data, or structured knowledge, are ontologies and Linked Open Data (LOD) such as Wikidata and DBpedia. Structured data is often described using the RDF data model.

There is a need to understand the object or domain of interest through structured knowledge. However, the subject of interest is often described only by unstructured data, i.e., a loosely delimited collection of texts or a vocabulary list. As a result, there is a large gap between such data and the structured data they should be mapped to.

To bridge this gap, the subject of interest has to be defined clearly at an early stage and then mapped to the structured data. This mapping is difficult for anyone who is not an expert in ontologies and LOD.

Therefore, to support the construction of an initial model of structured data for the domain of interest, we built a toolset that extracts a subset of the corresponding structured data based on a small vocabulary list of interest.

Figure 1: Overview of the domain ontology construction

This tutorial describes the procedure for obtaining structured data from LOD, using a real case study.

1.2 agGraphSearch package

The agGraphSearch package is a toolset that supports the construction of domain ontologies. It provides a method for extracting target domain concepts from large-scale public Linked Open Data (LOD). The proposed method obtains the class hierarchy of the domain concepts from the occurrences of common upper-level entities and the chains of path relationships that lead to them. The method is outlined in Figure 2.

Figure 2: Overview of the upper-level concept graph and analysis algorithm
The numbers in the nodes indicate the number of search entities that exist in the subordinate concepts.

As an example of class hierarchy extraction from LOD, this short tutorial provides a workflow to obtain and visualize the conceptual hierarchies related to leukemia from a Wikidata endpoint, starting from a few entity labels.

An overview of the workflow of the proposed method is shown in Figure 3.

Figure 3: Overview of the workflow of the proposed method

This result is similar to the network graph obtained with Wikidata Graph Builder.

1.3 Getting started

agGraphSearch can be installed from GitHub and then loaded with the following commands.

#install
if(!require("agGraphSearch")){
  install.packages( "devtools" )
  devtools::install_github( "kumeS/agGraphSearch" )
}

#load
library("agGraphSearch")

#GitHub URL
#browseURL("https://github.com/kumeS/agGraphSearch")

2 Workflow for searching the graph for leukemia
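
The code in the following sections uses a character vector named `terms` whose definition is not shown here. Judging from the LABEL column of the result table in section 2.2.2, it holds three leukemia-related labels. A minimal stand-in, under that assumption, could look like this:

```r
# Assumed stand-in for the undefined `terms` vector; the three labels are
# copied from the LABEL column of the result table in section 2.2.2.
terms <- c("acute lymphocytic leukemia",
           "Chronic eosinophilic leukemia",
           "philadelphia-positive myelogenous leukemia")

# quick sanity checks before sending anything to the endpoint
length(terms)
anyDuplicated(terms)
```

The labels are matched literally by the queries, so spelling and letter case should be kept exactly as they appear in Wikidata.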

2.2 SPARQL query (1) counting labels and class relations

Figure 4: Data model for the Wikidata class hierarchy

This tutorial mainly focuses on the data model for class hierarchies in Wikidata, shown in Figure 4. The class hierarchy of Wikidata is represented by the properties subClassOf (wdt:P279) and instanceOf (wdt:P31), which express conceptual relationships between entities. Wikidata entities are identified by IDs called QIDs. In addition to QIDs, this tutorial uses the properties for the representative name (rdfs:label) and the alias (skos:altLabel), which link a QID to its label information.
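
To make the data model concrete, the sketch below assembles by hand one query of the kind printed in the next section: it counts the distinct parent classes (wdt:P279) of entities that carry a given English rdfs:label. The helper name `build_parent_count_query` is hypothetical and not part of agGraphSearch, and the graph name is simply copied from the query output shown in this tutorial.

```r
# Hypothetical helper (not part of agGraphSearch): build a query that counts
# the distinct parent classes (wdt:P279) of entities with a given English label.
build_parent_count_query <- function(label,
                                     graph = "http://wikidata_nearly_full_201127") {
  # escape backslashes and double quotes so the label stays a valid SPARQL literal
  label <- gsub("\\", "\\\\", label, fixed = TRUE)
  label <- gsub("\"", "\\\"", label, fixed = TRUE)
  paste0(
    "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n",
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
    "SELECT (count(distinct ?parentClass) as ?Count_Of_ParentClass_Label)\n",
    "FROM <", graph, ">\n",
    "WHERE {\n",
    "  ?subject rdfs:label \"", label, "\"@en .\n",
    "  ?subject wdt:P279 ?parentClass .\n",
    "}"
  )
}

query <- build_parent_count_query("acute lymphocytic leukemia")
cat(query, "\n")
```

Replacing rdfs:label with skos:altLabel, or wdt:P279 with wdt:P31, gives the other variants of the counting query.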

2.2.1 Check SPARQL query

ter00 <- terms[1]

#check Query
CkeckQuery_agCount_Label_Num_Wikidata_P279_P31(Entity_Name = ter00)
## EndPoint:
## http://kozaki-lab.osakac.ac.jp/agraph/NEDO_pj
## Prefix:
## PREFIX wd: <http://www.wikidata.org/entity/>
## PREFIX wdt: <http://www.wikidata.org/prop/direct/>
## PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
## PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
## PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
## PREFIX owl: <http://www.w3.org/2002/07/owl#>
## PREFIX dct: <http://purl.org/dc/terms/>
## PREFIX foaf: <http://xmlns.com/foaf/0.1/>
## PREFIX wikibase: <http://wikiba.se/ontology#>
## ```````````````````````````````````````````
## ### 001 ###
## ```````````````````````````````````````````
## SELECT (count(distinct ?subject) as ?Count_As_Label)
## From <http://wikidata_nearly_full_201127> 
## WHERE {
## ?subject rdfs:label "acute lymphocytic leukemia"@en. 
## }
## ```````````````````````````````````````````
## ### 002 ###
## ```````````````````````````````````````````
## SELECT (count(distinct ?subject) as ?Count_As_AltLabel)
## From <http://wikidata_nearly_full_201127> 
## WHERE {
## ?subject skos:altLabel "acute lymphocytic leukemia"@en. 
## }
## ```````````````````````````````````````````
## ### 003 ###
## ```````````````````````````````````````````
## SELECT  (count(distinct ?parentClass ) as ?Count_Of_ParentClass_Label)
## From <http://wikidata_nearly_full_201127> 
## WHERE {
## ?subject rdfs:label "acute lymphocytic leukemia"@en. 
## ?subject wdt:P279 ?parentClass.
## }
## ```````````````````````````````````````````
## SELECT  (count(distinct ?parentClass ) as ?Count_Of_ParentClass_altLabel)
## From <http://wikidata_nearly_full_201127> 
## WHERE {
## ?subject skos:altLabel "acute lymphocytic leukemia"@en. 
## ?subject wdt:P279 ?parentClass.
## }
## ```````````````````````````````````````````
## ### 004 ###
## ```````````````````````````````````````````
## SELECT  (count(distinct ?childClass ) as ?Count_Of_ChildClass_Label)
## From <http://wikidata_nearly_full_201127> 
## WHERE {
## ?subject rdfs:label "acute lymphocytic leukemia"@en. 
## ?childClass wdt:P279 ?subject.
## }
## ```````````````````````````````````````````
## SELECT  (count(distinct ?childClass ) as ?Count_Of_ChildClass_altLabel)
## From <http://wikidata_nearly_full_201127> 
## WHERE {
## ?subject skos:altLabel "acute lymphocytic leukemia"@en. 
## ?childClass wdt:P279 ?subject.
## }
## ```````````````````````````````````````````
## ### 005 ###
## ```````````````````````````````````````````
## SELECT  (count(distinct ?instance ) as ?Count_InstanceOf_Label)
## From <http://wikidata_nearly_full_201127> 
## WHERE {
## ?subject rdfs:label "acute lymphocytic leukemia"@en. 
## ?subject wdt:P31 ?instance.
## }
## ```````````````````````````````````````````
## SELECT  (count(distinct ?instance ) as ?Count_InstanceOf_altLabel)
## From <http://wikidata_nearly_full_201127> 
## WHERE {
## ?subject skos:altLabel "acute lymphocytic leukemia"@en. 
## ?subject wdt:P31 ?instance.
## }
## ```````````````````````````````````````````
## ### 006 ###
## ```````````````````````````````````````````
## SELECT  (count(distinct ?instance ) as ?Count_Has_Instance_Label)
## From <http://wikidata_nearly_full_201127> 
## WHERE {
## ?subject rdfs:label "acute lymphocytic leukemia"@en. 
## ?instance wdt:P31 ?subject.
## }
## ```````````````````````````````````````````
## SELECT  (count(distinct ?instance ) as ?Count_Has_Instance_altLabel)
## From <http://wikidata_nearly_full_201127> 
## WHERE {
## ?subject skos:altLabel "acute lymphocytic leukemia"@en. 
## ?instance wdt:P31 ?subject.
## }
## ```````````````````````````````````````````
#Endpoint
agGraphSearch::KzLabEndPoint_Wikidata$EndPoint
#Graph id
agGraphSearch::KzLabEndPoint_Wikidata$FROM

#run SPARQL
#library(SPARQL)
res <- agCount_Label_Num_Wikidata_P279_P31(Entity_Name = ter00, 
                                           Dir="02_Short_Out")
res

#View table
#agTableDT(res, Width = "100px", Transpose = TRUE, AutoWidth=FALSE)

2.2.2 Counting labels and class relations with a for-loop

This step runs the same SPARQL counting queries for every input term in a for-loop.

The input consists of three terms.

#create an empty list with one slot per term
m <- vector("list", length(terms))

#Run
for(n in seq_along(terms)){
#message(n)
m[[n]] <- agCount_Label_Num_Wikidata_P279_P31(Entity_Name = terms[n],
                                              Dir="02_Short_Out")
}

#convert list to data.frame
(fm <- ListDF2DF(m))
##                                        LABEL Hit_Label Hit_ALL Hit_upClass_All
## 1                 acute lymphocytic leukemia         1       9               3
## 2              Chronic eosinophilic leukemia         1       3               2
## 3 philadelphia-positive myelogenous leukemia         1       1               1
##   Hit_downClass_All Hit_subClassOf Hit_InstanceOf Hit_subClassOf_ParentClass
## 1                 6              8              1                          2
## 2                 1              2              1                          1
## 3                 0              1              0                          1
##   Hit_subClassOf_ChildClass Hit_InstanceOf_ParentClass
## 1                         6                          1
## 2                         1                          1
## 3                         0                          0
##   Hit_InstanceOf_ChildClass Count_Of_Label Count_Of_AltLabel
## 1                         0              1                 0
## 2                         0              1                 0
## 3                         0              1                 0
##   Count_Of_subClassOf_ParentClass_Label
## 1                                     2
## 2                                     1
## 3                                     1
##   Count_Of_subClassOf_ParentClass_altLabel Count_Of_subClassOf_ChildClass_Label
## 1                                        0                                    6
## 2                                        0                                    1
## 3                                        0                                    0
##   Count_Of_subClassOf_ChildClass_altLabel Count_Of_InstanceOf_ParentClass_Label
## 1                                       0                                     1
## 2                                       0                                     1
## 3                                       0                                     0
##   Count_Of_InstanceOf_ParentClass_altLabel Count_Of_InstanceOf_ChildClass_Label
## 1                                        0                                    0
## 2                                        0                                    0
## 3                                        0                                    0
##   Count_Of_InstanceOf_ChildClass_altLabel
## 1                                       0
## 2                                       0
## 3                                       0
#View the data
#agTableDT(fm, Width = "100px", Transpose = TRUE, AutoWidth=FALSE)
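
The loop above collects one result per term in a list and then combines the list into a single table. The same collect-then-bind pattern can be sketched offline in base R. Here `count_stub` is a hypothetical stand-in for the endpoint call and returns dummy values, and treating `ListDF2DF` as roughly equivalent to `do.call(rbind, ...)` is an assumption, not a statement about its implementation.

```r
# Hypothetical stand-in for the endpoint call: returns a one-row data frame.
# The column values are dummies, not real query results.
count_stub <- function(label) {
  data.frame(LABEL = label, nchar_label = nchar(label), stringsAsFactors = FALSE)
}

labels <- c("term A", "term BB", "term CCC")

# collect one data frame per input, then bind the rows into a single table
res_list <- lapply(labels, count_stub)
res_df <- do.call(rbind, res_list)

res_df
```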

2.2.3 Extract only results with label and upper-level class

fm1 <- fm[c(fm$Hit_Label > 0),]
fm2 <- fm1[c(fm1$Hit_ALL > 0),]

#dim(fm); dim(fm1); dim(fm2)

2.2.4 Assigning Label information to QID

Lab01 <- fm2$LABEL

#Check Query
CkeckQuery_agWD_Alt_Wikidata(Lab01[1])
## EndPoint:
## http://kozaki-lab.osakac.ac.jp/agraph/NEDO_pj
## Prefix:
## PREFIX wd: <http://www.wikidata.org/entity/>
## PREFIX wdt: <http://www.wikidata.org/prop/direct/>
## PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
## PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
## PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
## PREFIX owl: <http://www.w3.org/2002/07/owl#>
## PREFIX dct: <http://purl.org/dc/terms/>
## PREFIX foaf: <http://xmlns.com/foaf/0.1/>
## PREFIX wikibase: <http://wikiba.se/ontology#>
## ```````````````````````````````````````````
## SELECT distinct ?subject  
## From <http://wikidata_nearly_full_201127> 
## WHERE {
## optional{ ?subject rdfs:label "acute lymphocytic leukemia"@en. }
## optional{ ?subject skos:altLabel "acute lymphocytic leukemia"@en. }
## }
## ```````````````````````````````````````````
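
The following sections index a vector named `QID` whose assignment is not shown in this tutorial. The result table in section 2.3 lists three prefixed IDs for the three input terms, so a stand-in that reproduces them, under that assumption, would be:

```r
# Assumed stand-in for the undefined `QID` vector; the IDs are copied from
# the result table in section 2.3 and already carry the "wd:" prefix.
QID <- c("wd:Q180664", "wd:Q5113976", "wd:Q55790812")

# every element should look like "wd:Q<digits>"
all(grepl("^wd:Q[0-9]+$", QID))
```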

2.2.5 Retry SPARQL by QID

#View query
CkeckQuery_agCount_ID_Num_Wikidata_QID_P279_P31(QID[1])
## EndPoint:
## http://kozaki-lab.osakac.ac.jp/agraph/NEDO_pj
## Prefix:
## PREFIX wd: <http://www.wikidata.org/entity/>
## PREFIX wdt: <http://www.wikidata.org/prop/direct/>
## PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
## PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
## PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
## PREFIX owl: <http://www.w3.org/2002/07/owl#>
## PREFIX dct: <http://purl.org/dc/terms/>
## PREFIX foaf: <http://xmlns.com/foaf/0.1/>
## PREFIX wikibase: <http://wikiba.se/ontology#>
## ```````````````````````````````````````````
## SELECT  (count(distinct ?parentClass) as ?Count_Of_ParentClass)
## From <http://wikidata_nearly_full_201127> 
## WHERE {
## wd:Q180664 wdt:P279 ?parentClass.
## }
## ```````````````````````````````````````````
## SELECT  (count(distinct ?childClass) as ?Count_Of_ChildClass)
## From <http://wikidata_nearly_full_201127> 
## WHERE {
## ?childClass wdt:P279 wd:Q180664.
## }
## ```````````````````````````````````````````
## SELECT  (count(distinct ?instance) as ?Count_InstanceOf)
## From <http://wikidata_nearly_full_201127> 
## WHERE {
## wd:Q180664 wdt:P31 ?instance.
## }
## ```````````````````````````````````````````
## SELECT  (count(distinct ?instance) as ?Count_Has_Instance)
## From <http://wikidata_nearly_full_201127> 
## WHERE {
## ?instance wdt:P31 wd:Q180664.
## }
## ```````````````````````````````````````````
#create an empty list with one slot per QID
QID_res <- vector("list", length(QID))

#Try SPARQL with QID
for(n in seq_along(QID)){
QID_res[[n]] <- agCount_ID_Num_Wikidata_QID_P279_P31(QID[n])
}

#convert list to data frame
QID_res2 <- ListDF2DF(QID_res)

#check results
head(QID_res2)
dim(QID_res2)
colnames(QID_res2)

#All
table(QID_res2$Hit_All)
table(QID_res2$Hit_All > 0)
table(QID_res2$Hit_All_Parent > 0)
table(QID_res2$Hit_All_Child > 0)

#View the results
#agTableDT(QID_res2, Width = "100px", Transpose = TRUE, AutoWidth=FALSE)

2.3 SPARQL query (2) Excluding the particular relations

This step searches for neighboring entities and properties of each search entity and records whether they are present. If a particular entity or property is found in the neighborhood, the search entity is excluded. The procedure is shown in Figure 5.

Examples of neighboring entities:
- family name (wd:Q101352)
- movie (wd:Q11424)

Examples of neighboring properties:
- sex or gender (wdt:P21)
- located in the administrative territorial entity (wdt:P131)

Figure 5: Exclusion of non-applicable entities by relationships with the adjacent entity and the property

#For neighboring entities
#Check query
CkeckQuery_agCount_ID_Prop_Obj_Wikidata_vP( Entity_ID=QID[1], Object="wd:Q101352" )
## EndPoint:
## http://kozaki-lab.osakac.ac.jp/agraph/NEDO_pj
## Prefix: 
## PREFIX wd: <http://www.wikidata.org/entity/>
## PREFIX wdt: <http://www.wikidata.org/prop/direct/>
## PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
## PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
## PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
## PREFIX owl: <http://www.w3.org/2002/07/owl#>
## PREFIX dct: <http://purl.org/dc/terms/>
## PREFIX foaf: <http://xmlns.com/foaf/0.1/>
## PREFIX wikibase: <http://wikiba.se/ontology#>
## ```````````````````````````````````````````
## SELECT (count(distinct ?p) as ?Count)
## From <http://wikidata_nearly_full_201127> 
## WHERE {
## wd:Q180664 ?p wd:Q101352.
## } 
## ```````````````````````````````````````````
#create an exclusion QID list without "wd:"
ExcluQ <- c("Q101352", "Q11424")
NumQ <- length(ExcluQ)
QIDdf <- data.frame(QID=QID)

#run SPARQL
for(m in seq_len(NumQ)){
#print(ExcluQ[m])

res <- list()
for(n in seq_along(QID)){
res[[n]] <- agCount_ID_Prop_Obj_Wikidata_vP(Entity_ID=QID[n], 
                                            Object=paste0("wd:", ExcluQ[m]))
}
res1 <- ListDF2DF(res)
#add one logical column per excluded entity (TRUE = the relation exists)
QIDdf[[ExcluQ[m]]] <- as.numeric(unlist(res1)) > 0
}

#View the result
agTableKB(QIDdf)

QID           Q101352  Q11424
wd:Q180664    FALSE    FALSE
wd:Q5113976   FALSE    FALSE
wd:Q55790812  FALSE    FALSE

#For neighboring properties
#Check query
CkeckQuery_agCount_ID_Prop_Obj_Wikidata_vO( Entity_ID=QID[1], Property="wdt:P21")
## EndPoint:
## http://kozaki-lab.osakac.ac.jp/agraph/NEDO_pj
## Prefix: 
## PREFIX wd: <http://www.wikidata.org/entity/>
## PREFIX wdt: <http://www.wikidata.org/prop/direct/>
## PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
## PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
## PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
## PREFIX owl: <http://www.w3.org/2002/07/owl#>
## PREFIX dct: <http://purl.org/dc/terms/>
## PREFIX foaf: <http://xmlns.com/foaf/0.1/>
## PREFIX wikibase: <http://wikiba.se/ontology#>
## ```````````````````````````````````````````
## SELECT (count(distinct ?o) as ?Count)
## From <http://wikidata_nearly_full_201127> 
## WHERE {
## wd:Q180664 wdt:P21 ?o.
## } 
## ```````````````````````````````````````````
#create an exclusion list without "wdt:"
ExcluP <- c("P21", "P131")
NumP <- length(ExcluP)

#run SPARQL
for(m in seq_len(NumP)){
print(ExcluP[m])

res <- list()
for(n in seq_along(QID)){
res[[n]] <- agCount_ID_Prop_Obj_Wikidata_vO(Entity_ID=QID[n], 
                                            Property=paste0("wdt:", ExcluP[m]))
}
res1 <- ListDF2DF(res)
#add one logical column per excluded property (TRUE = the property is used)
QIDdf[[ExcluP[m]]] <- as.numeric(unlist(res1)) > 0
}

#view the result
agTableKB(QIDdf)
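
The loops above store one TRUE/FALSE column per excluded entity or property in QIDdf. The step that actually drops flagged entities is not shown, so the following is only a sketch of how it could be done in base R. The flag table is a toy example with invented values, and "wd:Q_dummy" is not a real Wikidata ID.

```r
# Toy flag table in the same shape as QIDdf; the values are invented for
# illustration ("wd:Q_dummy" is not a real Wikidata ID).
flags <- data.frame(
  QID     = c("wd:Q180664", "wd:Q5113976", "wd:Q_dummy"),
  Q101352 = c(FALSE, FALSE, TRUE),   # linked to "family name"?
  P21     = c(FALSE, FALSE, FALSE),  # has "sex or gender"?
  stringsAsFactors = FALSE
)

# an entity is kept only if none of its exclusion flags is TRUE
flag_cols <- setdiff(colnames(flags), "QID")
keep <- rowSums(flags[, flag_cols, drop = FALSE]) == 0

kept_QID <- flags$QID[keep]
kept_QID
```

In the leukemia example all flags are FALSE, so this filter would keep all three search entities.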

2.4 SPARQL query (3) Examining the upper-level class relations

2.4.1 instanceOf

# instanceOf (wdt:P31)
# note: n still holds the last index of the previous loop, so the query is shown for the last QID
CkeckQuery_agWD_ID_Prop_Obj_Wikidata_vO(Entity_ID=QID[n], Property="wdt:P31")
## EndPoint:
## http://kozaki-lab.osakac.ac.jp/agraph/NEDO_pj
## Prefix: 
## PREFIX wd: <http://www.wikidata.org/entity/>
## PREFIX wdt: <http://www.wikidata.org/prop/direct/>
## PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
## PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
## PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
## PREFIX owl: <http://www.w3.org/2002/07/owl#>
## PREFIX dct: <http://purl.org/dc/terms/>
## PREFIX foaf: <http://xmlns.com/foaf/0.1/>
## PREFIX wikibase: <http://wikiba.se/ontology#>
## ```````````````````````````````````````````
## SELECT distinct ?o ?oLabelj ?oLabele 
## From <http://wikidata_nearly_full_201127> 
## WHERE {
## wd:Q55790812 wdt:P31 ?o .
## ?o rdfs:label ?oLabelj . filter(LANG(?oLabelj) = "ja").
## ?o rdfs:label ?oLabele . filter(LANG(?oLabele) = "en").
## }
## ```````````````````````````````````````````
#create an empty variable
res3 <- c()

#run SPARQL
for(n in seq_len(length(QID))){
res3[[n]] <- agWD_ID_Prop_Obj_Wikidata_vO(Entity_ID=QID[n], Property="wdt:P31")
}

2.4.2 subClassOf

# subClassOf (wdt:P279)
# note: n still holds the last index of the previous loop, so the query is shown for the last QID
CkeckQuery_agWD_ID_Prop_Obj_Wikidata_vO(Entity_ID=QID[n], Property="wdt:P279")
## EndPoint:
## http://kozaki-lab.osakac.ac.jp/agraph/NEDO_pj
## Prefix: 
## PREFIX wd: <http://www.wikidata.org/entity/>
## PREFIX wdt: <http://www.wikidata.org/prop/direct/>
## PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
## PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
## PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
## PREFIX owl: <http://www.w3.org/2002/07/owl#>
## PREFIX dct: <http://purl.org/dc/terms/>
## PREFIX foaf: <http://xmlns.com/foaf/0.1/>
## PREFIX wikibase: <http://wikiba.se/ontology#>
## ```````````````````````````````````````````
## SELECT distinct ?o ?oLabelj ?oLabele 
## From <http://wikidata_nearly_full_201127> 
## WHERE {
## wd:Q55790812 wdt:P279 ?o .
## ?o rdfs:label ?oLabelj . filter(LANG(?oLabelj) = "ja").
## ?o rdfs:label ?oLabele . filter(LANG(?oLabele) = "en").
## }
## ```````````````````````````````````````````
#create an empty variable
res4 <- c()

#run SPARQL
for(n in seq_len(length(QID))){
res4[[n]] <- agWD_ID_Prop_Obj_Wikidata_vO(Entity_ID=QID[n], Property="wdt:P279")
}

#convert list to data.frame
res3b <- ListDF2DF(res3)
res4b <- ListDF2DF(res4)
res <- rbind(res3b, res4b)

#remove rows with NA on "o" col
(res.na <- res[!is.na(res$o),])

#View the result
#agTableDT(res.na, Width = "100px", Transpose = FALSE, AutoWidth=FALSE)

2.5 SPARQL query (4) Searching for the upper-level concepts

2.5.1 Obtaining the upper-level concepts from the input terms

#create a new folder
if(!dir.exists("03_Short_Out")){dir.create("03_Short_Out")}

#create an empty variable
res5 <- c()

#run SPARQL; search the upper-level classes
for(n in 1:length(QID)){
  message(n)
  res5[[n]] <- PropertyPath_GraphUp_Wikidata(Entity_ID = QID[n], 
                                             Depth = 30)  
}

#check results
head(res5[[1]])
agTableDT(res5[[1]])

#Count rows
checkNrow_af(res5)

#Detect loop
checkLoop_af(res5)

#Save
saveRDS(res5,
        file="./03_Short_Out/Individual_upGraph.Rdata",
        compress = TRUE)

Alternatively, the same search can be run with purrr::map instead of a for-loop.

#run SPARQL with purrr::map function
res5m <- purrr::map(QID, 
                    PropertyPath_GraphUp_Wikidata, 
                    Depth = 30)

#check results
#Count rows
checkNrow_af(res5m)

#Detect loop
checkLoop_af(res5m)
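
To illustrate the idea of the upward search without access to an endpoint, the sketch below walks a toy subClassOf edge list upward, breadth-first and with a depth limit. The helper `walk_up` and the toy labels are hypothetical; this is not how `PropertyPath_GraphUp_Wikidata` is implemented, only an offline illustration of what "collecting the upper-level classes up to a given depth" means.

```r
# Toy edge list: each row says "subject is a subclass of parentClass".
# Labels are illustrative, not query results.
edges <- data.frame(
  subject     = c("leukemia_A", "leukemia", "blood_cancer", "leukemia_B", "cancer"),
  parentClass = c("leukemia", "blood_cancer", "cancer", "leukemia", "disease"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collect all edges reachable upward from `start`,
# visiting each node once so that cycles cannot cause an endless loop.
walk_up <- function(edges, start, depth = 30) {
  frontier <- start
  visited <- character(0)
  out <- edges[0, ]
  for (d in seq_len(depth)) {
    frontier <- setdiff(frontier, visited)
    if (length(frontier) == 0) break
    hit <- edges[edges$subject %in% frontier, ]
    visited <- c(visited, frontier)
    out <- rbind(out, hit)
    frontier <- unique(hit$parentClass)
  }
  unique(out)
}

up_A <- walk_up(edges, "leukemia_A")
up_A
```

The result is an edge list with the same two columns as the input, so one such table per search entity can be merged and de-duplicated in the same spirit as in the following sections.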

2.5.2 Individual network diagrams

#create a new folder
if(!dir.exists("03_Short_Out_vis")){dir.create("03_Short_Out_vis")}

#create networks
for(n in 1:length(res5)){
#n <- 1
a <- agIDtoLabel_Wikidata(Entity_ID = QID[n])
if(is.na(a[,2])){a[,2] <- a[,3]}

Lab00 <- paste(a[,c(2, 1)], collapse = ".")
FileName <- paste0("agVisNetwork_", Lab00,"_", format(Sys.time(), "%y%m%d"),".html")

#run the network creation
agVisNetwork(Graph=res5[[n]], 
             Selected=Lab00, 
             Browse=FALSE, 
             Output=TRUE,
             FilePath=FileName)
Sys.sleep(1)

filesstrings::file.move(files=FileName,
                        destinations="./03_Short_Out_vis",
                        overwrite = TRUE)

Name <- paste0("./agVisNetwork_", 
               formatC(n, flag="0", width=4), 
               "_", Lab00, "_files")
#file.remove() cannot delete a non-empty folder, so use unlink()
if(dir.exists(Name)){unlink(Name, recursive = TRUE)}
}

#View the results
#browseURL(paste0("./03_Short_Out_vis/", dir("03_Short_Out_vis", pattern=".html")[1]))
#browseURL(paste0("./03_Short_Out_vis/", dir("03_Short_Out_vis", pattern=".html")[2]))
#browseURL(paste0("./03_Short_Out_vis/", dir("03_Short_Out_vis", pattern=".html")[3]))

2.5.3 Merged network diagrams

#Merge the individual graphs into one graph
res6 <- ListDF2DF(res5)

#check NAs
table(is.na(res6))

#Delete duplicates
res6d <- Exclude_Graph_duplicates(input=res6)

#check dim
dim(res6); dim(res6d)

#Save
saveRDS(res6d,
        file="./03_Short_Out/Merged_upGraph.Rdata",
        compress = TRUE)

#run the network creation
if(TRUE){
FileName <- paste0("agVisNetwork_Merged", "_", 
                   format(Sys.time(), "%y%m%d"),".html")
agVisNetwork(Graph=res6d,
             Browse=FALSE,
             Output=TRUE,
             FilePath=FileName)
filesstrings::file.move(files=FileName,
                        destinations="./03_Short_Out_vis",
                        overwrite = TRUE)
}

#View the results
#browseURL(paste0("./03_Short_Out_vis/", FileName))

Figure 6: Merged network diagrams for search terms related to leukemia

2.5.4 Identification of the common upper-level entities using individual networks

The common upper-level concept is defined based on the edge list of triples obtained above.
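
The underlying idea is that an upper-level entity counts as "common" when it appears in the upper-level graphs of several search entities. The sketch below counts, on invented toy data, how many search entities reach each parent class and keeps those reached by at least two. Whether `countCommonEntities` and `Cutoff_FreqNum` apply exactly this rule (for example `>=` rather than `>`) is an assumption.

```r
# Toy parent-class sets, one per search entity (invented labels)
parents_by_entity <- list(
  e1 = c("leukemia", "cancer", "disease"),
  e2 = c("leukemia", "cancer", "disease"),
  e3 = c("lymphoma", "cancer", "disease")
)

# count, for each parent class, the number of search entities that reach it
up_all <- unlist(lapply(parents_by_entity, unique), use.names = FALSE)
freq <- as.data.frame(table(parentClass = up_all), stringsAsFactors = FALSE)

# keep parents shared by at least two search entities, most frequent first
common <- freq[freq$Freq >= 2, ]
common <- common[order(-common$Freq, common$parentClass), ]
common
```

Taking `unique()` per entity before counting ensures that a parent class reached through several paths from the same search entity is counted only once.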

##Graph data without the duplicates
#Number of entities
(E01 <- length(unique(c(res6d$subject, res6d$parentClass))))
#Number of labels
(E02 <- length(unique(c(res6d$subjectLabel, res6d$parentClassLabel))))
#Number of Triples
(E03 <- length(unique(res6d$triples)))

#Gathering the parent concepts
upEntity <- unlist(purrr::map(res5, function(x){unique(x$parentClass)}))

#calculate the frequency of common entities
Count_upEntity_DF <- countCommonEntities(upEntity)

#Count and view table
agTableDT(Count_upEntity_DF, Transpose = F, AutoWidth = FALSE)

#Count Freq
table(Count_upEntity_DF$Freq)

#extract parentClass & parentClassLabel from the merged dataset
Dat <- data.frame(res6d[,c(colnames(res6d) == "parentClass" | 
                          colnames(res6d) == "parentClassLabel")], 
                  stringsAsFactors = F)
head(Dat)

#Delete the duplicates
Dat0 <- Exclude_duplicates(Dat, 1)
head(Dat0)
dim(Dat); dim(Dat0)

#define the common upper-level entities
dim(Count_upEntity_DF); dim(Dat0)
head(Count_upEntity_DF); head(Dat0)
Count_upEntity_DF2 <- Cutoff_FreqNum(input1=Count_upEntity_DF, 
                                     input2=Dat0, 
                                     By="parentClass", 
                                     Sort="Freq", 
                                     FreqNum=2)

#check the results
head(Count_upEntity_DF2, n=10)
table(Count_upEntity_DF2$Freq)

#save
saveRDS(Count_upEntity_DF2,
        file = "./03_Short_Out/Count_upEntity_DF2.Rdata", compress = TRUE)
readr::write_excel_csv(Count_upEntity_DF2,
                       file="./03_Short_Out/Count_upEntity_DF2.csv")
#Count_upEntity_DF2 <- readRDS(file = "./03_Short_Out/Count_upEntity_DF2.Rdata")

#Calculation of inclusion rate
QID <- QIDdf$QID

##QID
#combine both columns with c(); the second argument of unique() is not a second vector
qid <- unique(c(res6d$subject, res6d$parentClass))
b <- setdiff(QID, qid)
b; length(b)

##rdfsLabel
#RdfsLabel <- unique(c(res6d$subjectLabel, res6d$parentClassLabel))

2.5.5 Results for the common upper-level entities

FileName <- paste0("./FrequencyGraph_", format(Sys.time(), "%y%m%d_%H%M"),".html")

pc_plot(Count_upEntity_DF2, 
        SaveFolder="03_Short_Out_vis", 
        FileName=FileName, 
        IDnum=3)

#View the results
#browseURL(paste0("./03_Short_Out_vis/", dir("03_Short_Out_vis", pattern="FrequencyGraph_")[2]))
#browseURL(paste0("./03_Short_Out_vis/", dir("03_Short_Out_vis", pattern="FrequencyGraph_")[1]))

2.6 Extraction of class hierarchies based on common entities

2.6.1 Set-up parameters

#Individual graphs
eachGraph <- readRDS("./03_Short_Out/Individual_upGraph.Rdata")
head(eachGraph[[1]])
sapply(eachGraph, dim)

#Search entities
(list1a <- readRDS("./02_Short_Out/SearchEntities.Rdata"))

head(list1a)
any(list1a == "wd:Q35120")

#Common entities
list2a <- readRDS("./03_Short_Out/Count_upEntity_DF2.Rdata")

head(list2a)
dim(list2a)

list2b <- unique(list2a$parentClass)
head(list2b)
any(list2b == "wd:Q35120")

#Remove Q35120 from the common list.
list2b <- list2b[list2b != "wd:Q35120"]

#Inclusion of list1a and list2b
table(list1a %in% list2b)
table(list2b %in% list1a)

2.6.2 Calculation

system.time(
  SearchNum <- agGraphAnalysis(eachGraph, 
                               list1a, 
                               list2b, 
                               LowerSearch=TRUE)
  )

head(SearchNum)
table(SearchNum$Levels)
sum(table(SearchNum$Levels))
table(!is.na(SearchNum[,2]))

Session information

## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] ja_JP.UTF-8/ja_JP.UTF-8/ja_JP.UTF-8/C/ja_JP.UTF-8/ja_JP.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] agGraphSearch_0.99.1 SPARQL_1.16          RCurl_1.98-1.3      
## [4] XML_3.99-0.6         EBImage_4.32.0       BiocStyle_2.18.1    
## 
## loaded via a namespace (and not attached):
##  [1] locfit_1.5-9.4      lattice_0.20-41     tidyr_1.1.3        
##  [4] visNetwork_2.0.9    fftwtools_0.9-11    png_0.1-7          
##  [7] assertthat_0.2.1    digest_0.6.27       utf8_1.2.1         
## [10] R6_2.5.0            tiff_0.1-8          filesstrings_3.2.2 
## [13] evaluate_0.14       httr_1.4.2          ggplot2_3.3.3      
## [16] highr_0.9           pillar_1.6.0        rlang_0.4.10       
## [19] lazyeval_0.2.2      data.table_1.14.0   jquerylib_0.1.4    
## [22] DT_0.18             rmarkdown_2.7       readr_1.4.0        
## [25] stringr_1.4.0       htmlwidgets_1.5.3   franc_1.1.3        
## [28] igraph_1.2.6        munsell_0.5.0       compiler_4.0.2     
## [31] xfun_0.22           pkgconfig_2.0.3     BiocGenerics_0.36.1
## [34] htmltools_0.5.1.1   tidyselect_1.1.0    tibble_3.1.1       
## [37] bookdown_0.22       viridisLite_0.4.0   fansi_0.4.2        
## [40] crayon_1.4.1        dplyr_1.0.5         bitops_1.0-7       
## [43] grid_4.0.2          jsonlite_1.7.2      formattable_0.2.1  
## [46] gtable_0.3.0        lifecycle_1.0.0     DBI_1.1.1          
## [49] magrittr_2.0.1      scales_1.1.1        stringi_1.5.3      
## [52] bslib_0.2.4         ellipsis_0.3.2      vctrs_0.3.8        
## [55] generics_0.1.0      tools_4.0.2         glue_1.4.2         
## [58] purrr_0.3.4         hms_1.0.0           jpeg_0.1-8.1       
## [61] networkD3_0.4       abind_1.4-5         parallel_4.0.2     
## [64] yaml_2.2.1          colorspace_2.0-0    BiocManager_1.30.12
## [67] strex_1.4.2         plotly_4.9.3        knitr_1.33         
## [70] sass_0.3.1